avatar

目录
ollama models

Ollama Models

ollama run hf.co/sm54/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview-Q4_K_M-GGUF
ollama pull qwen2.5:32b

build\Release\llama-cli.exe -m llama3-70b.gguf —n-gpu-layers 81 -n 4096 —threads 24

g:\project\llama.cpp\build\bin\Release\llama-server.exe -m llama3-70b.gguf —n-gpu-layers 81 -n 4096 —threads 24 —port 8080

Basic web UI can be accessed via browser: http://localhost:8080
Chat completion endpoint: http://localhost:8080/v1/chat/completions


评论